{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Forecasting Simple Time-Series"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows how to estimate time-series that are relatively constant or have a defined trend, but other than that do not have any patterns over time. If there are other patterns in the data, such as seasonality, see Forecasting Complex Time-Series."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load in the Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll load in the data with `pandas`, which should be review."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_excel('Sales COGS for Lab.xlsx')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Unnamed: 0
\n",
"
2017-12-31 00:00:00
\n",
"
2018-12-31 00:00:00
\n",
"
2019-12-31 00:00:00
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Sales
\n",
"
4500
\n",
"
5200
\n",
"
6100
\n",
"
\n",
"
\n",
"
1
\n",
"
Cost of Goods Sold
\n",
"
1200
\n",
"
1700
\n",
"
1900
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 2017-12-31 00:00:00 2018-12-31 00:00:00 \\\n",
"0 Sales 4500 5200 \n",
"1 Cost of Goods Sold 1200 1700 \n",
"\n",
" 2019-12-31 00:00:00 \n",
"0 6100 \n",
"1 1900 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we should use Sales, Cost of Goods sold as the index. Load in by setting the index column."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
2017-12-31
\n",
"
2018-12-31
\n",
"
2019-12-31
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Sales
\n",
"
4500
\n",
"
5200
\n",
"
6100
\n",
"
\n",
"
\n",
"
Cost of Goods Sold
\n",
"
1200
\n",
"
1700
\n",
"
1900
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 2017-12-31 2018-12-31 2019-12-31\n",
"Sales 4500 5200 6100\n",
"Cost of Goods Sold 1200 1700 1900"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_excel('Sales COGS for Lab.xlsx', index_col=0)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that looks better."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For an effective plot, we will need to transpose the data, so that the dates are the index (x-axis on plot), and the data types are columns (series on plot). Thankfully this is as simple as `df.T`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"\n",
"df.T.plot.line()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecast Using Most Recent Value\n",
"\n",
"This is the simplest forecast, just keep it the same as it was. \n",
"\n",
"Right now we have the dates as the columns. So access the columns and take the max to find the latest date."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2019-12-31 00:00:00')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"last_date = df.columns.max()\n",
"last_date"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now select the values which have the latest date as the forecast"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $6,100 and for COGS is $1,900\n"
]
}
],
"source": [
"fcst_sales = df.loc['Sales'][last_date]\n",
"fcst_cogs = df.loc['Cost of Goods Sold'][last_date]\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecast Using Average\n",
"\n",
"We have already seen how to take averages of `pandas` `Series`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $5,267 and for COGS is $1,600\n"
]
}
],
"source": [
"fcst_sales = df.loc['Sales'].mean()\n",
"fcst_cogs = df.loc['Cost of Goods Sold'].mean()\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecast Using Trend\n",
"\n",
"There are two methods to forecast using the trend."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trend Method 1: By Regression\n",
"\n",
"We will estimate the following regression model:\n",
"$$y_t = a + \\beta t + \\epsilon_t$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create DataFrame with $t$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we need to create a `DataFrame` which has a column for the $y$ and a column for the $t$:.\n",
"\n",
"To do this, first we can create a `DataFrame` from the `Series` we want to forecast."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Sales
\n",
"
\n",
" \n",
" \n",
"
\n",
"
2017-12-31
\n",
"
4500
\n",
"
\n",
"
\n",
"
2018-12-31
\n",
"
5200
\n",
"
\n",
"
\n",
"
2019-12-31
\n",
"
6100
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Sales\n",
"2017-12-31 4500\n",
"2018-12-31 5200\n",
"2019-12-31 6100"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = pd.DataFrame(df.loc['Sales'])\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use `reset_index(drop=True)` to get rid of the date index."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Sales
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
4500
\n",
"
\n",
"
\n",
"
1
\n",
"
5200
\n",
"
\n",
"
\n",
"
2
\n",
"
6100
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Sales\n",
"0 4500\n",
"1 5200\n",
"2 6100"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = for_fcst_df.reset_index(drop=True)\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now use `reset_index()` without the `drop=True` to get this new 0, 1, 2 index as a column."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" t Sales\n",
"0 0 4500\n",
"1 1 5200\n",
"2 2 6100"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = for_fcst_df.rename(columns={'index': 't'})\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's wrap this all up in a function as we'll need to use this for COGS as well, and for the CAGR approaches."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
t
\n",
"
Cost of Goods Sold
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
1200
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
1700
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
1900
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" t Cost of Goods Sold\n",
"0 0 1200\n",
"1 1 1700\n",
"2 2 1900"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def for_forecast_df_from_orig_df(orig_df, series_name):\n",
" \"\"\"\n",
" From a DataFrame where index is name of series to be forecasted and columns are time periods, create a \n",
" DataFrame with two columns, t in periods and the value to be forecasted, and rows are time periods.\n",
" \"\"\"\n",
" for_fcst_df = pd.DataFrame(orig_df.loc[series_name])\n",
" for_fcst_df = for_fcst_df.reset_index(drop=True).reset_index()\n",
" for_fcst_df = for_fcst_df.rename(columns={'index': 't'})\n",
" return for_fcst_df\n",
"\n",
"for_forecast_df_from_orig_df(df, 'Cost of Goods Sold')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now Run Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This should be review from the cost of equity exercise."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\users\\admin\\.virtualenvs\\fin-model-course-gvj3lsuv\\lib\\site-packages\\numpy\\core\\fromnumeric.py:2495: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.\n",
" return ptp(axis=axis, out=out, **kwargs)\n",
"c:\\users\\admin\\.virtualenvs\\fin-model-course-gvj3lsuv\\lib\\site-packages\\statsmodels\\stats\\stattools.py:71: ValueWarning: omni_normtest is not valid with less than 8 observations; 3 samples were given.\n",
" \"samples were given.\" % int(n), ValueWarning)\n"
]
},
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
Sales
R-squared:
0.995
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.990
\n",
"
\n",
"
\n",
"
Method:
Least Squares
F-statistic:
192.0
\n",
"
\n",
"
\n",
"
Date:
Tue, 19 Nov 2019
Prob (F-statistic):
0.0459
\n",
"
\n",
"
\n",
"
Time:
13:28:25
Log-Likelihood:
-15.816
\n",
"
\n",
"
\n",
"
No. Observations:
3
AIC:
35.63
\n",
"
\n",
"
\n",
"
Df Residuals:
1
BIC:
33.83
\n",
"
\n",
"
\n",
"
Df Model:
1
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
const
4466.6667
74.536
59.927
0.011
3519.602
5413.731
\n",
"
\n",
"
\n",
"
t
800.0000
57.735
13.856
0.046
66.407
1533.593
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
Omnibus:
nan
Durbin-Watson:
3.000
\n",
"
\n",
"
\n",
"
Prob(Omnibus):
nan
Jarque-Bera (JB):
0.531
\n",
"
\n",
"
\n",
"
Skew:
-0.707
Prob(JB):
0.767
\n",
"
\n",
"
\n",
"
Kurtosis:
1.500
Cond. No.
2.92
\n",
"
\n",
"
Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Sales R-squared: 0.995\n",
"Model: OLS Adj. R-squared: 0.990\n",
"Method: Least Squares F-statistic: 192.0\n",
"Date: Tue, 19 Nov 2019 Prob (F-statistic): 0.0459\n",
"Time: 13:28:25 Log-Likelihood: -15.816\n",
"No. Observations: 3 AIC: 35.63\n",
"Df Residuals: 1 BIC: 33.83\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"const 4466.6667 74.536 59.927 0.011 3519.602 5413.731\n",
"t 800.0000 57.735 13.856 0.046 66.407 1533.593\n",
"==============================================================================\n",
"Omnibus: nan Durbin-Watson: 3.000\n",
"Prob(Omnibus): nan Jarque-Bera (JB): 0.531\n",
"Skew: -0.707 Prob(JB): 0.767\n",
"Kurtosis: 1.500 Cond. No. 2.92\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import statsmodels.api as sm\n",
"\n",
"model = sm.OLS(for_fcst_df['Sales'], sm.add_constant(for_fcst_df['t']), hasconst=True)\n",
"results = model.fit()\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now get the intercept and $\\beta$ from the regression results."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"intercept = results.params['const']\n",
"beta = results.params['t']"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4466.666666666667"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"intercept"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"799.9999999999992"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"beta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now Predict from Regression Results"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6866.666666666664"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_sales = intercept + beta * 3\n",
"fcst_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's wrap up the regression approach into a function to use it with COGS as well."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6866.666666666664"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def intercept_and_t_beta_from_for_forecast_df(for_fcst_df, series_name):\n",
" \"\"\"\n",
" Calculates intercept and beta of time periods from DataFrame set up for forecasting\n",
" \"\"\"\n",
" model = sm.OLS(for_fcst_df[series_name], sm.add_constant(for_fcst_df['t']), hasconst=True)\n",
" results = model.fit()\n",
" intercept = results.params['const']\n",
" beta = results.params['t']\n",
" return intercept, beta\n",
"\n",
"def predict_from_intercept_beta_and_t(intercept, beta, t):\n",
" \"\"\"\n",
" Predicts value in period t based off regression intercept and beta\n",
" \"\"\"\n",
" return intercept + beta * t\n",
"\n",
"intercept, beta = intercept_and_t_beta_from_for_forecast_df(for_fcst_df, 'Sales')\n",
"fcst_sales = predict_from_intercept_beta_and_t(intercept, beta, 3)\n",
"fcst_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Entire Approach for COGS Using Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's write one more function to do the entire forecast, putting everything together."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $6,867 and for COGS is $2,300\n"
]
}
],
"source": [
"def forecast_trend_reg(df, series_name, t):\n",
" \"\"\"\n",
" Full workflow of forecasting trend via regression\n",
" \"\"\"\n",
" for_fcst_df = for_forecast_df_from_orig_df(df, series_name)\n",
" intercept, beta = intercept_and_t_beta_from_for_forecast_df(for_fcst_df, series_name)\n",
" fcst = predict_from_intercept_beta_and_t(intercept, beta, t)\n",
" return fcst\n",
"\n",
"fcst_sales = forecast_trend_reg(df, 'Sales', 3)\n",
"fcst_cogs = forecast_trend_reg(df, 'Cost of Goods Sold', 3)\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trend Method 2: By CAGR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's work off the `DataFrame` set up for forecasting from the regression approach."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
t
\n",
"
Sales
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
4500
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
5200
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
6100
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" t Sales\n",
"0 0 4500\n",
"1 1 5200\n",
"2 2 6100"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = for_forecast_df_from_orig_df(df, 'Sales')\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to calculate $$\\frac{y_T}{y_0}^{\\frac{1}{n}} - 1$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use `.iloc` (integer location) to get the first and last values of sales."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4500"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_0 = for_fcst_df['Sales'].iloc[0]\n",
"y_0"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6100"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_T = for_fcst_df['Sales'].iloc[-1]\n",
"y_T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can get the number of time periods elapsed in a similar way."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n = for_fcst_df['t'].iloc[-1] - for_fcst_df['t'].iloc[0]\n",
"n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now just calculate"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.16428327977153212"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cagr = (y_T / y_0)**(1 / n) - 1\n",
"cagr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now to get the predicted value for period 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's wrap this up into functions."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7102.1280066063455"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def cagr_from_for_forecast_df(for_fcst_df, series_name):\n",
" \"\"\"\n",
" Calculates CAGR from DataFrame set up for forecasting\n",
" \"\"\"\n",
" y_0 = for_fcst_df[series_name].iloc[0]\n",
" y_T = for_fcst_df[series_name].iloc[-1]\n",
" n = for_fcst_df['t'].iloc[-1] - for_fcst_df['t'].iloc[0]\n",
" cagr = (y_T / y_0)**(1 / n) - 1\n",
" return cagr\n",
"\n",
"\n",
"def predict_from_for_forecast_df_and_cagr(for_fcst_df, series_name, cagr, t):\n",
" \"\"\"\n",
" Forecast value from DataFrame set up for forecasting and calculated CAGR\n",
" \"\"\"\n",
" y_T = for_fcst_df[series_name].iloc[-1]\n",
" n = for_fcst_df['t'].iloc[-1] - for_fcst_df['t'].iloc[0]\n",
" \n",
" future_nper = t - n\n",
" fcst = y_T * (1 + cagr)**future_nper\n",
" return fcst\n",
"\n",
"\n",
"cagr = cagr_from_for_forecast_df(for_fcst_df, 'Sales')\n",
"fcst_sales = predict_from_for_forecast_df_and_cagr(for_fcst_df, 'Sales', cagr, 3)\n",
"fcst_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's put the entire approach in one function."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $7,102 and for COGS is $2,391\n"
]
}
],
"source": [
"def forecast_trend_cagr(df, series_name, t):\n",
" \"\"\"\n",
" Full workflow of forecasting trend via CAGR\n",
" \"\"\"\n",
" for_fcst_df = for_forecast_df_from_orig_df(df, series_name)\n",
" cagr = cagr_from_for_forecast_df(for_fcst_df, series_name)\n",
" fcst = predict_from_for_forecast_df_and_cagr(for_fcst_df, series_name, cagr, t)\n",
" return fcst\n",
"\n",
"fcst_sales = forecast_trend_cagr(df, 'Sales', 3)\n",
"fcst_cogs = forecast_trend_cagr(df, 'Cost of Goods Sold', 3)\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting as a %"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can estimate COGS as a percentage of sales. To do this, we must first forecast sales, then forecast the percentage of sales, then combine the two. We already have a sales forecast from the last section, so let's keep that. Next is forecasting the percentage of sales. To do this we must first calculate the historical percentage of sales."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
2017-12-31
\n",
"
2018-12-31
\n",
"
2019-12-31
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Sales
\n",
"
4500.000000
\n",
"
5200.000000
\n",
"
6100.000000
\n",
"
\n",
"
\n",
"
Cost of Goods Sold
\n",
"
1200.000000
\n",
"
1700.000000
\n",
"
1900.000000
\n",
"
\n",
"
\n",
"
COGS % Sales
\n",
"
0.266667
\n",
"
0.326923
\n",
"
0.311475
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 2017-12-31 2018-12-31 2019-12-31\n",
"Sales 4500.000000 5200.000000 6100.000000\n",
"Cost of Goods Sold 1200.000000 1700.000000 1900.000000\n",
"COGS % Sales 0.266667 0.326923 0.311475"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['COGS % Sales'] = df.loc['Cost of Goods Sold'] / df.loc['Sales']\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can forecast this by any of the available methods."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.3464971276446685"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_cogs_pct_sales = forecast_trend_reg(df, 'COGS % Sales', 3)\n",
"fcst_cogs_pct_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now combine with the existing sales forecast."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $7,102 and for COGS is $2,461\n"
]
}
],
"source": [
"fcst_cogs = fcst_sales * fcst_cogs_pct_sales\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## All the Approaches, Together\n",
"\n",
"There is a dizzying array of forecast options, even only considering simple forecast methods. Here is a quick overview of the approaches."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales (average) is $5,267 and for COGS (average) is $1,600\n",
"The forecasted value for sales (average) is $5,267 and for COGS (% of Sales, average) is $1,589\n",
"The forecasted value for sales (average) is $5,267 and for COGS (recent) is $1,900\n",
"The forecasted value for sales (average) is $5,267 and for COGS (% of Sales, recent) is $1,640\n",
"The forecasted value for sales (average) is $5,267 and for COGS (trend reg) is $2,300\n",
"The forecasted value for sales (average) is $5,267 and for COGS (% of Sales, trend reg) is $1,825\n",
"The forecasted value for sales (average) is $5,267 and for COGS (trend cagr) is $2,391\n",
"The forecasted value for sales (average) is $5,267 and for COGS (% of Sales, trend cagr) is $1,773\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (average) is $1,600\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (% of Sales, average) is $1,840\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (recent) is $1,900\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (% of Sales, recent) is $1,900\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (trend reg) is $2,300\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (% of Sales, trend reg) is $2,114\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (trend cagr) is $2,391\n",
"The forecasted value for sales (recent) is $6,100 and for COGS (% of Sales, trend cagr) is $2,053\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (average) is $1,600\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (% of Sales, average) is $2,072\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (recent) is $1,900\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (% of Sales, recent) is $2,139\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (trend reg) is $2,300\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (% of Sales, trend reg) is $2,379\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (trend cagr) is $2,391\n",
"The forecasted value for sales (trend reg) is $6,867 and for COGS (% of Sales, trend cagr) is $2,312\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (average) is $1,600\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (% of Sales, average) is $2,143\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (recent) is $1,900\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (% of Sales, recent) is $2,212\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (trend reg) is $2,300\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (% of Sales, trend reg) is $2,461\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (trend cagr) is $2,391\n",
"The forecasted value for sales (trend cagr) is $7,102 and for COGS (% of Sales, trend cagr) is $2,391\n"
]
}
],
"source": [
"def forecast_by_method(df, series_name, method, t):\n",
" if method == 'average':\n",
" return df.loc[series_name].mean()\n",
" elif method == 'recent':\n",
" last_date = df.columns.max()\n",
" return df.loc[series_name][last_date]\n",
" elif method == 'trend reg':\n",
" return forecast_trend_reg(df, series_name, t)\n",
" elif method == 'trend cagr':\n",
" return forecast_trend_cagr(df, series_name, t)\n",
"\n",
"methods = [\n",
" 'average',\n",
" 'recent',\n",
" 'trend reg',\n",
" 'trend cagr'\n",
"]\n",
"\n",
"t = 3\n",
"\n",
"cogs_forecasts = []\n",
"for sales_method in methods:\n",
" fcst_sales = forecast_by_method(df, 'Sales', sales_method, t)\n",
" for cogs_method in methods:\n",
" # Handle levels for COGS\n",
" fcst_cogs = forecast_by_method(df, 'Cost of Goods Sold', cogs_method, t)\n",
" cogs_forecasts.append(fcst_cogs)\n",
" print(f'The forecasted value for sales ({sales_method}) is ${fcst_sales:,.0f} and for COGS ({cogs_method}) is ${fcst_cogs:,.0f}')\n",
" # Handle % of sales for COGS\n",
" fcst_cogs_pct = forecast_by_method(df, 'COGS % Sales', cogs_method, t)\n",
" fcst_cogs = fcst_cogs_pct * fcst_sales\n",
" cogs_forecasts.append(fcst_cogs)\n",
" print(f'The forecasted value for sales ({sales_method}) is ${fcst_sales:,.0f} and for COGS (% of Sales, {cogs_method}) is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAANgElEQVR4nO3df6jd9X3H8edriZaCTiO5bi6JvTJS0G5ddXcxTMa6FmOso9k/Ax1ocGNhJQ4tbmu00FClkHXDMVknBLxUwVUcui3MjDQrbsU/ormKPxoz68Xaepts3hKxgp023Xt/nK9wjPfHudf7I8nn+YBLvvd9Pueez4HwPCffc05uqgpJUht+brk3IElaOkZfkhpi9CWpIUZfkhpi9CWpISuXewMzWb16dQ0PDy/3NiTplPLUU0/9qKqGprrspI7+8PAwY2Njy70NSTqlJPn+dJd5ekeSGmL0JakhRl+SGmL0JakhRl+SGmL0JakhRl+SGmL0JakhJ/WHs6SlkmRJbsffX6Hl5jN9iV6M5/r1kS/865yvIy03oy9JDTH6ktQQoy9JDTH6ktQQoy9JDTH6ktQQoy9JDTH6ktQQoy9JDZk1+knWJXksyeEkh5LcfMLlf5akkqzuvk+Su5OMJ3kuyWV9a7cmean72rrwd0eSNJNB/u+d48CtVfV0krOBp5Lsr6oXkqwDrgR+0Lf+amB993U5cA9weZLzgJ3ACFDdz9lTVa8v4P2RJM1g1mf6VXW0qp7ujt8EDgNruov/BvgLehF/1xbg/uo5AJyb5ALgKmB/VR3rQr8f2Lxwd0WSNJs5ndNPMgxcCjyR5LPAD6vq2ROWrQFe7ft+optNNz/xNrYlGUsyNjk5OZftSZJmMXD0k5wFPAzcQu+UzxeBL021dIpZzTB/76Bqd1WNVNXI0NDQoNuTJA1goOgnOYNe8B+oqkeAXwYuAp5N8gqwFng6yS/Sewa/ru/qa4EjM8wlSUtkkHfvBLgXOFxVdwFU1fNVdX5VDVfVML2gX1ZV/w3sAW7o3sWzEXijqo4C+4BNSVYlWQVs6maSpCUyyLt3rgCuB55P8kw3u72q9k6zfi/wGWAceAu4EaCqjiW5EzjYrbujqo7Ne+eSpDmbNfpV9ThTn4/vXzPcd1zA9mnWjQKjc9uiJGmh+IlcSWqI0Zekhhh9SWqI0Zekhhh9SWqI0Zekhhh9SWqI0Zekhhh9SWqI0Zekhhh9SWqI0Zekhhh9SWqI0Zekhhh9SWqI0Zekhgzym7OkU86vffmbvPGTny767QzveHRRf/45Hz6DZ3duWtTbUFuMvk5Lb/zkp7yy65rl3sYHttgPKmqPp3ckqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaYvQlqSFGX5IaMmv0k6xL8liSw0kOJbm5m/9Vkv9K8lySf0pybt91bksynuTFJFf1zTd3s/EkOxbnLkmSpjPIM/3jwK1VdTGwEdie5BJgP/ArVfVx4LvAbQDdZdcCHwM2A3+fZEWSFcDXgKuBS4DrurWSpCUya/Sr6mhVPd0dvwkcBtZU1Ter6ni37ACwtjveAjxYVW9X1feAcWBD9zVeVS9X1TvAg91aSdISmdM5/STDwKXAEydc9IfAv3XHa4BX+y6b6GbTzU+8jW1JxpKMTU5OzmV7kqRZDBz9JGcBDwO3VNWP++ZfpHcK6IF3R1NcvWaYv3dQtbuqRqpqZGhoaNDtSZIGsHKQRUnOoBf8B6rqkb75VuB3gU9X1bsBnwDW9V19LXCkO55uLklaAoO8eyfAvcDhqrqrb74Z+ALw2ap6q+8qe4Brk3woyUXAeuBJ4CCwPslFSc6k92LvnoW7K5Kk2QzyTP8K4Hrg+STPdLPbgbuBDwH7e48LHKiqP6mqQ0keAl6gd9pne1X9DCDJTcA+YAUwWlWHFvTeSJJmNGv0q+pxpj4fv3eG63wF+MoU870zXU+StLj8RK4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNWSQX4wunXLOvngHv3rfjuXexgd29sUA1yz3NnQaMfo6Lb15eBev7Dr1Yzm849Hl3oJOM57ekaSGGH1JaojRl6SGGH1JaojRl6SGGH1JaojRl6SGGH1JaojRl6SGGH1JaojRl6SGGH1JaojRl6SGGH1JaojRl6SGGH1Jasis0U+yLsljSQ4nOZTk5m5+XpL9SV7q/lzVzZPk7iTjSZ5Lclnfz9rarX8pydbFu1uSpKkM8kz/OHBrVV0MbAS2J7kE2AF8q6rWA9/qvge4GljffW0D7oHegwSwE7gc2ADsfPeBQpK0NGaNflUdraqnu+M3gcPAGmALcF+37D7g97rjLcD91XMAODfJBcBVwP6qOlZVrwP7gc0Lem8kSTOa0zn9JMPApcATwC9U1VHoPTAA53fL1gCv9l1toptNNz/xNrYlGUsyNjk5OZftSZJmMXD0k5wFPAzcUlU/nmnpFLOaYf7eQdXuqhqpqpGhoaFBtydJGsBA0U9yBr3gP1BVj3Tj/+lO29D9+Vo3nwDW9V19LXBkhrkkaYkM8u6dAPcCh6vqrr6L9gDvvgNnK/AvffMbunfxbATe6E7/7AM2JVnVvYC7qZtJkpbIygHWXAFcDzyf5JludjuwC3goyR8BPwB+v7tsL/AZYBx4C7gRoKqOJbkTONitu6Oqji3IvZAkDWTW6FfV40x9Ph7g01OsL2D7ND9rFBidywYlSQvHT+RKUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkOMviQ1xOhLUkNWLvcGpMUyvOPR5d7CB3bOh89Y7i3oNGP0dVp6Zdc1i34bwzseXZLbkRaSp3ckqSFGX5IaYvQlqSGzRj/JaJLXknynb/aJJAeSPJNkLMmGbp4kdycZT/Jcksv6rrM1yUvd19bFuTuSpJkM8kz/68DmE2ZfBb5cVZ8AvtR9D3A1sL772gbcA5DkPGAncDmwAdiZZNUH3bwkaW5mjX5VfRs4duIY+Pnu+BzgSHe8Bbi/eg4A5ya5ALgK2F9Vx6rqdWA/738gkSQtsvm+ZfMWYF+Sv6b3wPGb3XwN8GrfuoluNt38fZJso/evBC688MJ5bk+SNJX5vpD7OeDzVbUO+DxwbzfPFGtrhvn7h1W7q2qkqkaGhobmuT1J0lTmG/2twCPd8T/SO08PvWfw6/rWraV36me6uSRpCc03+keA3+6OPwW81B3vAW7o3sWzEXijqo4C+4BNSVZ1L+Bu6maSpCU06zn9JN8APgmsTjJB7104fwz8bZKVwP/SnYMH9gKfAcaBt4AbAarqWJI7gYPdujuq6sQXhyVJi2zW6FfVddNc9OtTrC1g+zQ/ZxQYndPuJEkLyk/kSlJDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNcToS1JDjL4kNWTW6CcZTfJaku+cMP/TJC8mOZTkq33z25KMd5dd1Tff3M3Gk+xY2LshSRrEygHWfB34O+D+dwdJfgfYAny8qt5Ocn43vwS4FvgY8EvAvyf5aHe1rwFXAhPAwSR7quqFhbojkqTZzRr9qvp2kuETxp8DdlXV292a17r5FuDBbv69JOPAhu6y8ap6GSDJg91aoy9JS2i+5/Q/CvxWkieS/GeS3+jma4BX+9ZNdLPp5u+TZFuSsSRjk5OT89yeJGkq843+SmAVsBH4c+ChJAEyxdqaYf7+YdXuqhqpqpGhoaF5bk+SNJVBzulPZQJ4pKoKeDLJ/wGru/m6vnVrgSPd8XRzSdISmW/0/xn4FPAf3Qu1ZwI/AvYA/5DkLnov5K4HnqT3TH99kouAH9J7sfcPPuDepQXT+4fqPK73l3Nb33ueJC2fWaOf5BvAJ4HVSSaAncAoMNq9jfMdYGv3rP9QkofovUB7HNheVT/rfs5NwD5gBTBaVYcW4f5I82KM1YqczH/ZR0ZGamxsbLm3IUmnlCRPVdXIVJf5iVxJaojRl6SGGH1JaojRl6SGGH1JaojRl6SGGH1JashJ/T79JJPA95d7H9I0VtP7JLp0svlIVU35n5ed1NGXTmZJxqb7AIx0svL0jiQ1xOhLUkOMvjR/u5d7A9JceU5fkhriM31JaojRl6SGGH1pjpJsTvJikvEkO5Z7P9JceE5fmoMkK4DvAlfS+53QB4HrquqFZd2YNCCf6UtzswEYr6qXq+od4EFgyzLvSRqY0ZfmZg3wat/3E91MOiUYfWluMsXMc6Q6ZRh9aW4mgHV9368FjizTXqQ5M/rS3BwE1ie5KMmZwLXAnmXekzSwlcu9AelUUlXHk9wE7ANWAKNVdWiZtyUNzLdsSlJDPL0jSQ0x+pLUEKMvSQ0x+pLUEKMvSQ0x+pLUEKMvSQ35f3mblt1/13PbAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pd.DataFrame(cogs_forecasts).plot.box()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}